AITopics | soft target

Combating Noise: Semi-supervised Learning by Region Uncertainty Quantification

Neural Information Processing Systems, Apr-25-2026, 20:33:08 GMT

Semi-supervised learning aims to leverage a large amount of unlabeled data for performance boosting. Existing works primarily focus on image classification. In this paper, we delve into semi-supervised learning for object detection, where labeled data are more labor-intensive to collect. Current methods are easily distracted by noisy regions generated by pseudo labels. To combat the noisy labeling, we propose noise-resistant semi-supervised learning by quantifying the region uncertainty. We first investigate the adverse effects brought by different forms of noise associated with pseudo labels. Then we propose to quantify the uncertainty of regions by identifying the noise-resistant properties of regions over different strengths. By importing the region uncertainty quantification and promoting multi-peak probability distribution output, we introduce uncertainty into training and further achieve noise-resistant learning. Experiments on both PASCAL VOC and MS COCO demonstrate the extraordinary performance of our method.

artificial intelligence, machine learning, proposal

Industry: Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Combating Noise: Semi-supervised Learning by Region Uncertainty Quantification

Neural Information Processing Systems, Feb-8-2026, 15:15:09 GMT

artificial intelligence, machine learning, proposal

Country: Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

MEDUSA: A Multimodal Deep Fusion Multi-Stage Training Framework for Speech Emotion Recognition in Naturalistic Conditions

Chatzichristodoulou, Georgios, Kosmopoulou, Despoina, Kritikos, Antonios, Poulopoulou, Anastasia, Georgiou, Efthymios, Katsamanis, Athanasios, Katsouros, Vassilis, Potamianos, Alexandros

arXiv.org Artificial Intelligence, Sep-5-2025

Speech emotion recognition (SER) is a challenging task due to the subjective nature of human emotions and their uneven representation under naturalistic conditions. We propose MEDUSA, a multimodal framework with a four-stage training pipeline, which effectively handles class imbalance and emotion ambiguity. The first two stages train an ensemble of classifiers that utilize DeepSER, a novel extension of a deep cross-modal transformer fusion mechanism from pretrained self-supervised acoustic and linguistic representations. Manifold MixUp is employed for further regularization. The last two stages optimize a trainable meta-classifier that combines the ensemble predictions. Our training approach incorporates human annotation scores as soft targets, coupled with balanced data sampling and multitask learning. MEDUSA ranked 1st in Task 1: Categorical Emotion Recognition in the Interspeech 2025: Speech Emotion Recognition in Naturalistic Conditions Challenge.
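As a rough illustration of what using human annotation scores as soft targets can look like, the sketch below normalizes per-class annotator votes into a probability distribution and scores a prediction against it with cross-entropy. The function names, the vote counts, and the uniform fallback are assumptions made for this example; the abstract does not say how MEDUSA builds its targets.

```python
import math

def soft_targets_from_votes(votes):
    """Normalize per-class annotator vote counts into a probability distribution."""
    total = sum(votes)
    if total == 0:
        # Assumption: with no annotations, fall back to a uniform distribution.
        return [1.0 / len(votes)] * len(votes)
    return [v / total for v in votes]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(logits, target):
    """Cross-entropy between a soft target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

# Hypothetical utterance: 5 annotators voting over 4 emotion classes.
votes = [3, 1, 0, 1]
target = soft_targets_from_votes(votes)
loss = soft_cross_entropy([2.0, 0.5, -1.0, 0.1], target)
```

Unlike a one-hot label, the target keeps the annotators' disagreement, so a prediction that spreads some mass over the minority classes is penalized less.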

artificial intelligence, machine learning, natural language

2506.09556

Country:

Europe (1.00)
North America > United States (0.98)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Knowledge Distillation Framework for Accelerating High-Accuracy Neural Network-Based Molecular Dynamics Simulations

Matsumura, Naoki, Yoshimoto, Yuta, Iwasaki, Yuto, Yamazaki, Meguru, Sakai, Yasufumi

arXiv.org Artificial Intelligence, Jun-23-2025

Neural network potentials (NNPs) offer a powerful alternative to traditional force fields for molecular dynamics (MD) simulations. Accurate and stable MD simulations, crucial for evaluating material properties, require training data encompassing both low-energy stable structures and high-energy structures. Conventional knowledge distillation (KD) methods fine-tune a pre-trained NNP as a teacher model to generate training data for a student model. However, in material-specific models, this fine-tuning process increases energy barriers, making it difficult to create training data containing high-energy structures. To address this, we propose a novel KD framework that leverages a non-fine-tuned, off-the-shelf pre-trained NNP as a teacher. Its gentler energy landscape facilitates the exploration of a wider range of structures, including the high-energy structures crucial for stable MD simulations. Our framework employs a two-stage training process: first, the student NNP is trained with a dataset generated by the off-the-shelf teacher; then, it is fine-tuned with a smaller, high-accuracy density functional theory (DFT) dataset. We demonstrate the effectiveness of our framework by applying it to both organic (polyethylene glycol) and inorganic (Li$_{10}$GeP$_{2}$S$_{12}$) materials, achieving comparable or superior accuracy in reproducing physical properties compared to existing methods. Importantly, our method reduces the number of expensive DFT calculations by 10x compared to existing NNP generation methods, without sacrificing accuracy. Furthermore, the resulting student NNP achieves up to 106x speedup in inference compared to the teacher NNP, enabling significantly faster and more efficient MD simulations.

artificial intelligence, machine learning, teacher model

2506.15337

Country: Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)

Genre: Research Report (1.00)

Industry: Materials > Chemicals > Commodity Chemicals > Petrochemicals > Polymers & Plastics (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Equally Critical: Samples, Targets, and Their Mappings in Datasets

Yang, Runkang, Sun, Peng, Shang, Xinyi, Tang, Yi, Lin, Tao

arXiv.org Artificial Intelligence, Jun-4-2025

Data inherently possesses dual attributes: samples and targets. For targets, knowledge distillation has been widely employed to accelerate model convergence, primarily relying on teacher-generated soft target supervision. Conversely, recent advancements in data-efficient learning have emphasized sample optimization techniques, such as dataset distillation, while neglecting the critical role of targets. This dichotomy motivates our investigation into how samples and targets collectively influence training dynamics. To address this gap, we first establish a taxonomy of existing paradigms through the lens of sample-target interactions, categorizing them into distinct sample-to-target mapping strategies. Building upon this foundation, we then propose a novel unified loss framework to assess their impact on training efficiency. Through extensive empirical studies on our proposed strategies, we comprehensively analyze how variations in target and sample types, quantities, and qualities influence model training, providing six key insights to enhance training efficacy.
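To make "teacher-generated soft target supervision" concrete, here is a minimal sketch of the standard temperature-scaled distillation loss: the KL divergence between softened teacher and student distributions. It is not the unified loss framework proposed in the paper, and the temperature value and logits are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature; a higher temperature gives a flatter output."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # softened student prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Hypothetical logits for a 3-class problem.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 2.0, 0.0]
loss = kd_loss(student, teacher)
```

The T^2 factor is the usual convention for keeping gradient magnitudes comparable across temperatures when this term is mixed with a hard-label loss.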

machine learning, natural language, teacher model

2506.01987

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.97)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Sensing and Signal Processing > Image Processing (0.67)

Improving Multimodal Large Language Models Using Continual Learning

Srivastava, Shikhar, Harun, Md Yousuf, Shrestha, Robik, Kanan, Christopher

arXiv.org Artificial Intelligence, Oct-25-2024

Generative large language models (LLMs) exhibit impressive capabilities, which can be further augmented by integrating a pre-trained vision model into the original LLM to create a multimodal LLM (MLLM). However, this integration often significantly decreases performance on natural language understanding and generation tasks, compared to the original LLM. This study investigates this issue using the LLaVA MLLM, treating the integration as a continual learning problem. We evaluate five continual learning methods to mitigate forgetting and identify a technique that enhances visual understanding while minimizing linguistic performance loss. Our approach reduces linguistic performance degradation by up to 15% over the LLaVA recipe, while maintaining high multimodal accuracy. We also demonstrate the robustness of our method through continual learning on a sequence of vision-language tasks, effectively preserving linguistic skills while acquiring new multimodal capabilities.

Figure 1: Summary results of the best CL methods we evaluated for training LLaVA 1.5 compared to the unimodal base LLM and the original version of LLaVA 1.5. All results are with Pythia 2.8B as the base LLM. The best method has almost the same vision-language (VL) accuracy while providing a large increase in linguistic performance on 1 NLG and 4 NLU tasks by 8% and 2% (absolute), respectively.

arxiv preprint arxiv, large language model, machine learning

2410.19925

Genre: Research Report > New Finding (0.46)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Deep End-to-End Survival Analysis with Temporal Consistency

Vieyra, Mariana Vargas, Frossard, Pascal

arXiv.org Artificial Intelligence, Oct-9-2024

In this study, we present a novel Survival Analysis algorithm designed to efficiently handle large-scale longitudinal data. Our approach draws inspiration from Reinforcement Learning principles, particularly the Deep Q-Network paradigm, extending Temporal Learning concepts to Survival Regression. A central idea in our method is temporal consistency, a hypothesis that past and future outcomes in the data evolve smoothly over time. Our framework uniquely incorporates temporal consistency into large datasets by providing a stable training signal that captures long-term temporal relationships and ensures reliable updates. Additionally, the method supports arbitrarily complex architectures, enabling the modeling of intricate temporal dependencies, and allows for end-to-end training. Through numerous experiments we provide empirical evidence demonstrating our framework's ability to exploit temporal consistency across datasets of varying sizes. Moreover, our algorithm outperforms benchmarks on datasets with long sequences, demonstrating its ability to capture long-term patterns. Finally, ablation studies show how our method enhances training stability.

dataset, sequence, target network

2410.06786

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Towards Within-Class Variation in Alzheimer's Disease Detection from Spontaneous Speech

Kang, Jiawen, Han, Dongrui, Meng, Lingwei, Zhou, Jingyan, Li, Jinchao, Wu, Xixin, Meng, Helen

arXiv.org Artificial Intelligence, Sep-21-2024

Alzheimer's Disease (AD) detection has emerged as a promising research area that employs machine learning classification models to distinguish between individuals with AD and those without. Unlike conventional classification tasks, we identify within-class variation as a critical challenge in AD detection: individuals with AD exhibit a spectrum of cognitive impairments. Given that many AD detection tasks lack fine-grained labels, simplistic binary classification may overlook two crucial aspects: within-class differences and instance-level imbalance. The former compels the model to map AD samples with varying degrees of impairment to a single diagnostic label, disregarding certain changes in cognitive function, while the latter biases the model towards overrepresented severity levels. This work presents early efforts to address these challenges. We propose two novel methods, Soft Target Distillation (SoTD) and Instance-level Re-balancing (InRe), which target these two problems respectively. Experiments on the ADReSS and ADReSSo datasets demonstrate that the proposed methods significantly improve detection accuracy. Further analysis reveals that SoTD effectively harnesses the strengths of multiple component models, while InRe substantially alleviates model over-fitting. These findings provide insights for developing more robust and reliable AD detection models.
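One plausible reading of distilling soft targets from multiple component models is sketched below: the component models' predicted AD probabilities are averaged into a graded label, which then replaces the hard 0/1 label in a binary cross-entropy loss. This is an illustration under that assumption, not the SoTD procedure itself; the function names and probabilities are hypothetical.

```python
import math

def ensemble_soft_label(component_probs):
    """Average the component models' predicted P(AD) into one soft label."""
    return sum(component_probs) / len(component_probs)

def binary_soft_cross_entropy(pred, soft_label, eps=1e-12):
    """Binary cross-entropy against a soft label in [0, 1]."""
    pred = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(soft_label * math.log(pred)
             + (1.0 - soft_label) * math.log(1.0 - pred))

# Hypothetical P(AD) from three component models for one speaker.
component_probs = [0.9, 0.6, 0.75]
soft_label = ensemble_soft_label(component_probs)
loss_at_label = binary_soft_cross_entropy(soft_label, soft_label)
loss_overconfident = binary_soft_cross_entropy(0.99, soft_label)
```

With a graded label, the loss is smallest when the student's output matches the label rather than when it saturates at 1.0, which is how soft targets can encode differing degrees of impairment within the AD class.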

classifier, detection, estimation

2409.16322

Country: Asia > China > Hong Kong (0.05)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations

Baharoon, Mohammed, Klein, Jonathan, Michels, Dominik L.

arXiv.org Artificial Intelligence, May-23-2024

Vision-language contrastive learning frameworks like CLIP enable learning representations from natural language supervision, and provide strong zero-shot classification capabilities. However, due to the nature of the supervisory signal in these paradigms, they lack the ability to learn localized features, leading to degraded performance on dense prediction tasks like segmentation and detection. On the other hand, self-supervised learning methods have shown the ability to learn granular representations, complementing the high-level features in vision-language training. In this work, we present Harmony, a framework that combines vision-language training with discriminative and generative self-supervision to learn visual features that can be generalized across vision downstream tasks. Our framework is specifically designed to work on web-scraped data by not relying on negative examples and addressing the one-to-one correspondence issue using soft CLIP targets generated by an EMA model. We comprehensively evaluate Harmony across various vision downstream tasks and find that it significantly outperforms the baseline CLIP and the previously leading joint self and weakly-supervised methods, MaskCLIP and SLIP. Specifically, when comparing against these methods, Harmony shows superior performance in fine-tuning and zero-shot classification on ImageNet-1k, semantic segmentation on ADE20K, and both object detection and instance segmentation on MS-COCO, when pre-training a ViT-S/16 on CC3M. We also show that Harmony outperforms other self-supervised learning methods like iBOT and MAE across all tasks evaluated. Our code is publicly available at https://github.com/MohammedSB/Harmony.
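The two ingredients named in "soft CLIP targets generated by an EMA model" can be sketched generically: an exponential moving average of the online model's parameters, and a softmax over the EMA model's image-text similarities that replaces the one-hot "matching caption" target. The momentum, temperature, and similarity values are assumptions for illustration, and Harmony's actual formulation may differ.

```python
import math

def ema_update(ema_params, online_params, momentum=0.99):
    """Move each EMA parameter a small step toward the online parameter."""
    return [momentum * e + (1.0 - momentum) * p
            for e, p in zip(ema_params, online_params)]

def soft_targets(similarities, temperature=0.1):
    """Turn one image's similarities to every caption in the batch into a distribution."""
    scaled = [s / temperature for s in similarities]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical two-parameter model, updated three times with momentum 0.5.
ema = [0.0, 0.0]
for _ in range(3):
    ema = ema_update(ema, [1.0, 2.0], momentum=0.5)

# Hypothetical EMA-model similarities of one image to three captions.
targets = soft_targets([0.9, 0.8, 0.1])
```

Here the second caption is nearly as similar as the first, so it receives non-trivial target mass instead of being treated as a pure negative, which is the one-to-one correspondence issue the abstract mentions.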

encoder, objective, representation

2405.14239

Country: North America > United States > Florida > Broward County > Fort Lauderdale (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

LightHGNN: Distilling Hypergraph Neural Networks into MLPs for $100\times$ Faster Inference

Feng, Yifan, Luo, Yihe, Ying, Shihui, Gao, Yue

arXiv.org Artificial Intelligence, Feb-6-2024

Hypergraph Neural Networks (HGNNs) have recently attracted much attention and exhibited satisfactory performance due to their superiority in high-order correlation modeling. However, the high-order modeling capability of hypergraphs also brings increased computational complexity, which hinders practical industrial deployment. In practice, we find that one key barrier to the efficient deployment of HGNNs is the high-order structural dependencies during inference. In this paper, we propose to bridge the gap between HGNNs and inference-efficient Multi-Layer Perceptrons (MLPs) to eliminate the hypergraph dependency of HGNNs and thus reduce computational complexity as well as improve inference speed. Experiments on eight hypergraph datasets demonstrate that even without hypergraph dependency, the proposed LightHGNNs can still achieve competitive or even better performance than HGNNs and outperform vanilla MLPs by 16.3 on average. Extensive experiments on three graph datasets further show the average best performance of our LightHGNNs compared with all other methods. Experiments on synthetic hypergraphs with 5.5w vertices indicate LightHGNNs can run $100\times$ faster than HGNNs, showcasing their ability for latency-sensitive deployments.

Compared to the graph with pair-wise correlation, the hypergraph is composed of degree-free hyperedges, which have an inherent superior modeling ability to represent those more complex high-order correlations. However, for large-scale industrial applications, especially for those big-data, small-memory, and high-speed demand environments, the Multi-Layer Perceptrons (MLPs) remain the primary workhorse. The main reason for such an academic-industrial gap for HGNNs is the dependence on the hypergraph structure in inference, which requires large memories in practice.

dataset, hyperedge, vertex

2402.04296

Country:

South America > Paraguay > Asunción > Asunción (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.64)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.94)
